
    Using Incomplete Information for Complete Weight Annotation of Road Networks -- Extended Version

    We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel-cost-based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective. Comment: This is an extended version of "Using Incomplete Information for Complete Weight Annotation of Road Networks," accepted for publication in IEEE TKDE.
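    To make the edge-weighting idea concrete, here is a minimal Python sketch, not the paper's actual objective: it computes PageRank values for edges via the line graph of a toy road network and fits a ridge regression on a handful of observed travel times. The toy graph, the observed edge costs, and the single-feature design are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact objective): annotate unobserved edges
# by regressing observed travel times on a topological feature, here the
# PageRank of each edge computed on the line graph of the road network.
# The toy graph and the observed costs below are illustrative assumptions.
import networkx as nx
import numpy as np

road = nx.DiGraph()
road.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("b", "d")])

# Line graph: each road edge becomes a node, so node PageRank on it
# yields a PageRank value per edge of the original network.
line = nx.line_graph(road)
pagerank = nx.pagerank(line, alpha=0.85)

edges = list(road.edges())
X = np.array([[pagerank[e]] for e in edges])     # one feature per edge
observed = {("a", "b"): 42.0, ("b", "c"): 55.0}  # GPS-derived travel costs

# Ridge regression on the few observed edges, then predict the rest.
mask = np.array([e in observed for e in edges])
y = np.array([observed[e] for e in edges if e in observed])
lam = 1e-3
A = X[mask]
w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
weights = {e: float(x @ w) for e, x in zip(edges, X)}
print(weights)   # a travel-cost estimate for every edge, observed or not
```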

    Document Simplicial Complex

    A k-simplex is defined as a k-dimensional geometric structure which is the convex hull of k+1 points. Given k+1 affinely independent points x_0, ..., x_k ∈ R^k, the set

    C = { a_0 x_0 + ... + a_k x_k  :  a_0 + ... + a_k = 1 and a_i ≥ 0 for all i }

    is defined as the k-simplex determined by them. The simplex is a very basic building structure in abstract topology. A collection of simplexes (or simplices) satisfying certain conditions is called a geometric simplicial complex, which further helps to analyze a geometric structure on a bigger scale. An abstract simplicial complex is a purely combinatorial description of the geometric notion of a simplicial complex, consisting of a family of non-empty finite sets closed under the operation of taking non-empty subsets.

    A text document can be visualized as a geometric structure in topology. A document is defined as a collection of words, where each word is considered to be part of a vocabulary and to carry a certain meaning, and an n-gram is a contiguous sequence of n items from a given sample of text. Using the n-gram concept to define a simplex, we can construct an abstract simplicial complex out of every text document. In this model, every simplex captures local structure or behavior, while the document simplicial complex, which is the collection of all (n-1)-simplices, captures the global behavior of the document. We study this assuming we have a bag of documents, i.e., a universal set of documents.

    The aim of this thesis is to understand the abstract structure admitted by text documents in order to find, more accurately, the similar documents within a given family of text documents. In our discussion, we visualize a document as a geometric entity and use this representation to speed up querying: given a query document, one can find semantically similar documents more efficiently in terms of both time and similarity. For example, given the set of documents {1. "after clearing high school one joins college", 2. "College can be joined only after passing high school", 3. "High school and college must be attended by everyone"}, documents 1 and 2 are more semantically similar than 1 and 3 or 2 and 3.

    After a brief glance at abstract topology, we study the topological structure and behavior of text documents. A novel representation of documents is given in this thesis: using this new structure, we represent each document as a geometric entity which can then be analyzed using topological tools. Using the Earth Mover's distance and the Hausdorff distance, we give a new formulation to fetch semantically similar documents for a given query. To represent documents as a mathematical structure in some R^k, we use the Word2Vec model to find a vector representation of each word in a text document.
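    The n-gram-to-simplex construction and the Hausdorff comparison can be sketched directly. The following Python is one illustrative reading of the abstract, not the thesis's implementation: random vectors stand in for trained Word2Vec embeddings, and the `complex_of` and `hausdorff` helpers are hypothetical names.

```python
# Sketch: every n-gram of a document yields a simplex (its set of words),
# closure under non-empty subsets gives an abstract simplicial complex, and
# documents are compared via the Hausdorff distance between word-vector sets.
# Random "word vectors" stand in for a trained Word2Vec model (assumption).
import numpy as np
from itertools import combinations
from scipy.spatial.distance import directed_hausdorff

rng = np.random.default_rng(0)
vocab = {}  # word -> vector in R^k (stand-in for Word2Vec)

def vec(word, k=50):
    if word not in vocab:
        vocab[word] = rng.normal(size=k)
    return vocab[word]

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def complex_of(text, n=3):
    """Abstract simplicial complex: each n-gram is a simplex, and we add all
    of its non-empty faces so the family is closed under taking subsets."""
    tokens = text.lower().split()
    simplices = set()
    for g in ngrams(tokens, n):
        verts = sorted(set(g))
        for size in range(1, len(verts) + 1):
            for face in combinations(verts, size):
                simplices.add(face)
    return simplices

def hausdorff(doc_a, doc_b):
    A = np.array([vec(w) for w in sorted(set(doc_a.lower().split()))])
    B = np.array([vec(w) for w in sorted(set(doc_b.lower().split()))])
    return max(directed_hausdorff(A, B)[0], directed_hausdorff(B, A)[0])

d1 = "after clearing high school one joins college"
d2 = "college can be joined only after passing high school"
print(len(complex_of(d1)), hausdorff(d1, d2))
```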

    Self-Supervised Few-Shot Learning on Point Clouds

    The increased availability of massive point clouds coupled with their utility in a wide variety of applications such as robotics, shape synthesis, and self-driving cars has attracted increased attention from both industry and academia. Recently, deep neural networks operating on labeled point clouds have shown promising results on supervised learning tasks like classification and segmentation. However, supervised learning leads to the cumbersome task of annotating the point clouds. To combat this problem, we propose two novel self-supervised pre-training tasks that encode a hierarchical partitioning of the point clouds using a cover-tree, where point cloud subsets lie within balls of varying radii at each level of the cover-tree. Furthermore, our self-supervised learning network is restricted to pre-training on the support set (comprising the scarce training examples) used to train the downstream network in a few-shot learning (FSL) setting. Finally, the fully trained self-supervised network's point embeddings are input to the downstream task's network. We present a comprehensive empirical evaluation of our method on both downstream classification and segmentation tasks and show that supervised methods pre-trained with our self-supervised learning method significantly improve the accuracy of state-of-the-art methods. Additionally, our method also outperforms previous unsupervised methods in downstream classification tasks. Comment: Accepted at NeurIPS 2020.
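    A rough picture of the hierarchical ball partition the pre-training tasks rely on can be given in a few lines. This is a simplified greedy cover, not the authors' cover-tree code; the `ball_partition` function, the radius schedule, and the toy cloud are assumptions for illustration.

```python
# Simplified cover-tree-style partition: at each level, points are greedily
# covered by balls whose radius halves with depth, giving nested subsets of
# the point cloud (the structure the self-supervised tasks encode).
import numpy as np

def ball_partition(points, radius, depth=0, max_depth=3):
    """Return nested {center_row_index: subtree} dicts over ball memberships."""
    if depth == max_depth or len(points) <= 1:
        return None
    centers, assign = [], np.full(len(points), -1)
    for i in range(len(points)):
        if assign[i] == -1:                    # not yet covered: new center
            centers.append(i)
            d = np.linalg.norm(points - points[i], axis=1)
            assign[(d <= radius) & (assign == -1)] = i
    return {c: ball_partition(points[assign == c], radius / 2, depth + 1,
                              max_depth)
            for c in centers}

cloud = np.random.default_rng(1).uniform(size=(256, 3))  # toy point cloud
tree = ball_partition(cloud, radius=0.5)
print(len(tree))   # number of balls at the top level
```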

    BERTops: Studying BERT Representations under a Topological Lens

    Proposing scoring functions to effectively understand, analyze, and learn various properties of the high-dimensional hidden representations of large-scale transformer models like BERT can be a challenging task. In this work, we explore a new direction by studying the topological features of BERT hidden representations using persistent homology (PH). We propose a novel scoring function named the 'persistence scoring function (PSF)', which: (i) accurately captures the homology of the high-dimensional hidden representations, correlates well with the test set accuracy of a wide range of datasets, and outperforms existing scoring metrics; (ii) captures interesting post-fine-tuning 'per-class' level properties from both qualitative and quantitative viewpoints; (iii) is more stable to perturbations than the baseline functions, which makes it a very robust proxy; and (iv) finally, also serves as a predictor of the attack success rates for a wide category of black-box and white-box adversarial attack methods. Our extensive correlation experiments demonstrate the practical utility of PSF on various NLP tasks relevant to BERT. Code is available at https://github.com/chauhanjatin10/BERTops. © 2022 IEEE
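    For readers unfamiliar with the pipeline, here is an illustrative computation of persistent homology on a batch of hidden representations. The exact PSF formula is defined in the paper; the total-H0-persistence summary below and the random stand-in for BERT embeddings are assumptions of this sketch.

```python
# Illustrative only (not the paper's PSF): persistent homology of hidden
# representations via a Vietoris-Rips filtration, summarized by the total
# persistence of H0 features. Requires the `ripser` package; the random
# matrix stands in for BERT sentence embeddings.
import numpy as np
from ripser import ripser

X = np.random.default_rng(0).normal(size=(200, 768))  # fake BERT embeddings

dgms = ripser(X, maxdim=1)["dgms"]       # persistence diagrams for H0, H1
h0 = dgms[0]
finite = h0[np.isfinite(h0[:, 1])]       # drop the infinite-death component
total_persistence = float((finite[:, 1] - finite[:, 0]).sum())
print(total_persistence)                 # crude proxy for cluster structure
```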

    Improving Data Quality by Leveraging Statistical Relational Learning

    Digitally collected data suffers from many data quality issues, such as duplicate, incorrect, or incomplete data. A common approach for counteracting these issues is to formulate a set of data cleaning rules to identify and repair incorrect, duplicate, and missing data. Data cleaning systems must be able to treat data quality rules holistically, to incorporate heterogeneous constraints within a single routine, and to automate data curation. We propose an approach to data cleaning based on statistical relational learning (SRL). We argue that a formalism - Markov logic - is a natural fit for modeling data quality rules. Our approach allows for probabilistic joint inference over interleaved data cleaning rules to improve data quality. Furthermore, it eliminates the need to specify the order of rule execution. We describe how data quality rules expressed as formulas in first-order logic directly translate into the predictive model in our SRL framework.
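    As a minimal sketch of the idea: a data quality rule written as a weighted first-order formula, here the functional dependency zip -> city, can be grounded over record pairs so that each violation contributes its weight to a cost that a repair should minimize. Full MLN inference replaces this toy scoring in practice; the rule weight, records, and helper names are illustrative assumptions.

```python
# Toy grounding of one weighted data quality rule (zip -> city). A repair
# that minimizes this cost mimics, very loosely, what joint MLN inference
# over all interleaved rules would optimize. Data and weight are made up.
from itertools import combinations

records = [
    {"id": 1, "zip": "94305", "city": "Stanford"},
    {"id": 2, "zip": "94305", "city": "Stanford"},
    {"id": 3, "zip": "94305", "city": "Palo Alto"},  # likely violation
]

W_FD = 2.5   # rule weight: zip(r1)=zip(r2) => city(r1)=city(r2)

def violation_cost(db):
    cost = 0.0
    for r1, r2 in combinations(db, 2):
        if r1["zip"] == r2["zip"] and r1["city"] != r2["city"]:
            cost += W_FD   # each violated grounding adds the rule's weight
    return cost

print(violation_cost(records))   # 5.0: record 3 conflicts with 1 and 2
```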

    Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

    The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to capture the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this end, our paper proposes a novel attention-based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multi-hop relations in our model. Our empirical study offers insights into the efficacy of our attention-based model, and we show marked performance gains in comparison to state-of-the-art methods on all datasets.
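    The core mechanism, attention over the triples in an entity's neighborhood, can be sketched in a few lines of PyTorch. This is a simplified illustration of that idea, not the authors' released code; the `TripleAttention` module and its dimensions are assumptions.

```python
# Simplified neighborhood attention for KG embeddings: each neighboring
# triple (h, r, t) of an entity is embedded by projecting the concatenated
# entity and relation vectors, scored, and softmax-pooled into the entity's
# new representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(3 * dim, dim)   # embed (h, r, t) jointly
        self.score = nn.Linear(dim, 1)        # unnormalized attention logit

    def forward(self, h, r, t):
        # h, r, t: (num_neighbors, dim) embeddings of the triples around one
        # entity; returns that entity's attention-pooled representation.
        c = F.leaky_relu(self.proj(torch.cat([h, r, t], dim=-1)))
        alpha = torch.softmax(self.score(c), dim=0)   # weights over triples
        return (alpha * c).sum(dim=0)

dim, n = 64, 5
layer = TripleAttention(dim)
out = layer(torch.randn(n, dim), torch.randn(n, dim), torch.randn(n, dim))
print(out.shape)   # torch.Size([64])
```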